A Statistical Framework for the Analysis of ChIP-Seq Data.

نویسندگان

  • Pei Fen Kuan
  • Dongjun Chung
  • Guangjin Pan
  • James A Thomson
  • Ron Stewart
  • Sündüz Keleş
چکیده

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionalized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and shearing, to understand factors affecting background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of biases such as mappability and GC content and develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A statistical framework for power calculations in ChIP-seq experiments

MOTIVATION ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and mapping of epigenomic marks. Although the availability of basic analysis tools for ChIP-seq data is rapidly increasing, there has not been much progress on the related design issues. A challenging question for designing a ChIP-seq experiment is how deeply should the ChIP and the contro...

متن کامل

Analysis of ChIP-seq Data with ‘mosaics’ Package

This vignette provides an introduction to the analysis of ChIP-seq data with ‘mosaics’ package. R package mosaics implements MOSAiCS, a statistical framework for the analysis of ChIP-seq data, proposed in [1]. MOSAiCS stands for“MOdel-based one and two Sample Analysis and Inference for ChIP-Seq Data”. Based on careful investigation of biases in ChIP-seq data such as mappability and GC content, ...

متن کامل

Genome analysis A novel statistical method for quantitative comparison of multiple ChIP-seq datasets

Motivation: ChIP-seq is a powerful technology to measure the protein binding or histone modification strength in the whole genome scale. Although there are a number of methods available for single ChIP-seq data analysis (e.g. ‘peak detection’), rigorous statistical method for quantitative comparison of multiple ChIP-seq datasets with the considerations of data from control experiment, signal to...

متن کامل

ALDEx2: ANOVA-Like Differential Expression tool for compositional data

Fundamentally, many high throughput sequencing approaches generate similar data: reads are mapped to features in each sample, these features are normalized, then statistical difference between the features composing each group or condition is calculated. The standard statistical tools used to analyze RNA-seq, ChIP-seq, 16S rRNA gene sequencing, metagenomics, etc. are fundamentally different for...

متن کامل

A novel statistical method for quantitative comparison of multiple ChIP-seq datasets

MOTIVATION ChIP-seq is a powerful technology to measure the protein binding or histone modification strength in the whole genome scale. Although there are a number of methods available for single ChIP-seq data analysis (e.g. 'peak detection'), rigorous statistical method for quantitative comparison of multiple ChIP-seq datasets with the considerations of data from control experiment, signal to ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of the American Statistical Association

دوره 106 495  شماره 

صفحات  -

تاریخ انتشار 2011